DESCRIPTION

PATTERN IDENTIFICATION METHOD, APPARATUS, AND PROGRAM 



TECHNICAL FIELD 
The present invention relates to a method, apparatus, and
program for identifying the pattern of an input signal by
hierarchically extracting features in, e.g., image
recognition or voice recognition.

BACKGROUND ART

There is a technique which identifies the pattern of an
input signal by hierarchically extracting features. This
method extracts a high-order feature by using the
lower-order features which form the feature to be
extracted. Accordingly, the method can perform
identification that is robust against the variance of the
pattern to be identified. However, to increase the
robustness against the variance of a pattern, it is
necessary to increase the number of types of features to be
extracted, and this increases the processing cost. If the
number of types of features to be extracted is not
increased, the possibility of identification errors
increases.

To solve the above problems, the following pattern
recognition method has been proposed. First, feature
vectors of patterns of individual classes are arranged in
descending order of vector component dispersion to form
dictionary patterns, and feature vectors are generated from
an input pattern. Then, matching is performed with the
dictionary patterns of high orders up to the Nth order. On
the basis of the matching result, matching with lower
orders is performed. In this manner, the processing cost
can be reduced.

The following pattern recognition dictionary formation
apparatus and pattern recognition apparatus have also been
proposed. First, feature vectors are extracted from an
input pattern, and classified into clusters in accordance
with the degree of matching with the standard vector of
each cluster. Category classification is then performed in
accordance with the degree of matching between the feature
vectors and the category standard vectors in the clusters
into which the input pattern is classified. Consequently,
the cost of the matching process can be reduced.

DISCLOSURE OF INVENTION

It is, however, desired to perform pattern recognition
which is robust against the variance of an input pattern,
and which reduces the processing cost while decreasing the
possibility of identification errors.

To solve the above problems, according to the present
invention, a pattern identification method of identifying a
pattern of input data by hierarchically extracting features
of the input data comprises a first feature extraction step
of extracting a feature of a first layer, an analysis step
of analyzing a distribution of a feature extraction result
in the first feature extraction step, and a second feature
extraction step of extracting a feature of a second layer
higher than the first layer on the basis of the
distribution analyzed in the analysis step.

According to another aspect of the present invention, a
pattern identification apparatus for identifying a pattern
of input data by hierarchically extracting features of the
input data comprises first feature extracting means for
extracting a feature of a first layer, analyzing means for
analyzing a distribution of a feature extraction result
obtained by the first feature extracting means, and second
feature extracting means for extracting a feature of a
second layer higher than the first layer on the basis of
the distribution analyzed by the analyzing means.

According to still another aspect of the present invention,
a pattern identification program for allowing a computer to
identify a pattern of input data by hierarchically
extracting features of the input data comprises a first
feature extraction step of extracting a feature of a first
layer, an analysis step of analyzing a distribution of a
feature extraction result in the first feature extraction
step, and a second feature extraction step of extracting a
feature of a second layer higher than the first layer on
the basis of the distribution analyzed in the analysis
step.

According to still another aspect of the present invention,
a pattern identification method of identifying a pattern of
input data by hierarchically extracting features of the
input data comprises a first feature extraction step of
extracting a feature of a first layer, and a second feature
extraction step of extracting a feature of a second layer
higher than the first layer by one on the basis of a
feature extraction result in the first layer and a feature
extraction result in a layer other than the first layer.

According to still another aspect of the present invention,
a pattern identification apparatus for identifying a
pattern of input data by hierarchically extracting features
of the input data comprises first feature extraction means
for extracting a feature of a first layer, and second
feature extraction means for extracting a feature of a
second layer higher than the first layer by one on the
basis of a feature extraction result in the first layer and
a feature extraction result in a layer other than the first
layer.

According to still another aspect of the present invention,
a pattern identification program for causing a computer to
identify a pattern of input data by hierarchically
extracting features of the input data comprises a first
feature extraction step of extracting a feature of a first
layer, and a second feature extraction step of extracting a
feature of a second layer higher than the first layer by
one on the basis of a feature extraction result in the
first layer and a feature extraction result in a layer
other than the first layer.

Other features and advantages of the present invention will
be apparent from the following description taken in
conjunction with the accompanying drawings, in which like
reference characters designate the same or similar parts
throughout the figures thereof.

BRIEF DESCRIPTION OF DRAWINGS 
The accompanying drawings, which are incorporated in and
constitute a part of the specification, illustrate
embodiments of the invention and, together with the
description, serve to explain the principles of the
invention.

Fig. 1A is a view showing the basic arrangement of a
pattern identification apparatus according to the first
embodiment;

Fig. 1B is a view showing the basic arrangement of the
pattern identification apparatus according to the first
embodiment;

Fig. 2 is a view showing the functional arrangement of the
pattern identification apparatus according to the first
embodiment;

Fig. 3 is a flowchart showing the flow of processing in the
first embodiment;

Fig. 4 is a view showing face images as an identification
category in the first embodiment;

Fig. 5 is a view showing four types of initial feature
extraction results;

Fig. 6 is a view showing the initial feature extraction
results at positions where local features to be extracted
are present;

Fig. 7 is a view showing the arrangement of a basic
convolutional neural network;

Fig. 8 is a view showing the functional arrangement of a
pattern identification apparatus according to the second
embodiment;

Figs. 9A and 9B are flowcharts showing the flow of
processing in the second embodiment;

Fig. 10 is a view showing the functional arrangement of a
pattern identification apparatus according to the third
embodiment;

Figs. 11A and 11B are flowcharts showing the flow of
processing in the third embodiment;

Fig. 12 is a view showing the block configuration of a
computer which implements the present invention;

Fig. 13 is a view showing the hierarchical structure
according to the fourth embodiment;

Fig. 14A is a view for explaining an integrating process
according to the fourth embodiment; and

Fig. 14B is a view for explaining the integrating process
according to the fourth embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION 
Preferred embodiments of the present invention will now be
described in detail in accordance with the accompanying
drawings.

(First Embodiment)

As the first embodiment of the present invention, a method
of identifying whether input two-dimensional image data
belongs to a certain specific category will be explained
below.

This embodiment assumes, as identification categories, face
images as indicated by i to iv in Fig. 4, in each of which
the center of a face is present substantially at the center
of the input image, and a non-face image as indicated by v
in Fig. 4, which is not a face image. A method of
identifying which of these two categories input image data
belongs to will be described below.
In this embodiment, identification of whether input image
data is a face image or not will be explained. However, the
application of the present invention is not limited to such
images. That is, the present invention is also applicable
to other image patterns or to a case in which input data is
voice data. In addition, to simplify the explanation,
identification of whether input image data falls under a
single category, i.e., a face, will be described below.
However, the present invention is applicable not only to
identification of a single category but also to
identification of a plurality of categories.

Figs. 1A and 1B illustrate the basic arrangements of a
pattern identification apparatus. An outline of this
pattern identification apparatus will be described below
with reference to Figs. 1A and 1B.

A data input unit 11 shown in Fig. 1A inputs data as an
object of pattern identification. A hierarchical feature
extraction processor 12 hierarchically extracts features
from the input data, and identifies the pattern of the
input data. The hierarchical feature extraction processor
12 includes a primary feature extraction processor 121 for
performing a primary feature extraction process, and a
secondary feature extraction processor 122 for performing a
secondary feature extraction process. An extraction result
distribution analyzer 13 analyzes the distribution of
features extracted by the primary feature extraction
processor 121.

In this pattern identification apparatus, the data input
unit 11 inputs data to be identified. The hierarchical
feature extraction processor 12 performs a hierarchical
feature extraction process for this input data. In this
hierarchical extraction process, the primary feature
extraction processor 121 hierarchically extracts a
plurality of primary features from the input data. Then,
the extraction result distribution analyzer 13 analyzes the
distribution of at least one type of primary feature
extracted by the primary feature extraction processor 121.
In addition, on the basis of the result of analysis, the
secondary feature extraction processor 122 extracts
secondary features.

Fig. 1B shows another basic arrangement of the pattern
identification apparatus. An outline of this pattern
identification apparatus will be explained below with
reference to Fig. 1B.

Referring to Fig. 1B, a data input unit 11 inputs data as
an object of pattern identification. A hierarchical feature
extraction processor 12 hierarchically extracts features
from the input data, and identifies the pattern of the
input data. The hierarchical feature extraction processor
12 includes a primary feature extraction processor 121 for
performing a primary feature extraction process, and a
secondary feature extraction processor 122 for performing a
secondary feature extraction process. An extraction result
distribution analyzer 13 analyzes the distribution of
features extracted by the primary feature extraction
processor 121. A category likelihood calculator 14
calculates the likelihood of each category of secondary
features from the result of analysis by the extraction
result distribution analyzer 13.

In this pattern identification apparatus, the data input
unit 11 inputs data to be identified. The hierarchical
feature extraction processor 12 performs a hierarchical
feature extraction process for this input data. In this
hierarchical extraction process, the primary feature
extraction processor 121 hierarchically extracts a
plurality of primary features from the input data. Then,
the extraction result distribution analyzer 13 analyzes the
distribution of at least one type of primary feature
extracted by the primary feature extraction processor 121.
On the basis of the result of analysis by the extraction
result distribution analyzer 13, the category likelihood
calculator 14 calculates the likelihood of each category of
secondary features to be extracted by the secondary feature
extraction processor 122. The secondary feature extraction
processor 122 extracts secondary features which belong to
categories each having a calculated likelihood equal to or
larger than a predetermined value.
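
The selection made in the arrangement of Fig. 1B can be
sketched as follows. This is only a minimal illustration,
assuming the calculated likelihoods are held in a
dictionary; the function name select_categories, the
category names, and the threshold value are hypothetical
and do not appear in this description.

```python
def select_categories(likelihoods, threshold):
    """Return the categories whose likelihood is equal to or larger than threshold."""
    return [name for name, value in likelihoods.items() if value >= threshold]


# Hypothetical likelihoods derived from the distribution analysis.
likelihoods = {"rotated_face": 0.7, "small_face": 0.2, "frontal_face": 0.5}
selected = select_categories(likelihoods, threshold=0.5)
```

Only the categories returned here would be passed to the
secondary feature extraction, which is where the saving in
processing cost would come from.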

Fig. 2 shows the functional arrangement of the pattern
identification apparatus according to this embodiment.
Fig. 3 shows the flow of processing in this embodiment. The
processing in this embodiment will be described below with
reference to Figs. 2 and 3. Referring to Fig. 2, the
solid-line arrows indicate the flows of actual signal data,
and the broken-line arrow indicates the flow of instruction
signals, such as operation instructions, rather than actual
signal data. The same expression is used in Figs. 8 and 10
(to be described later).

First, in step S301, an image input unit 21 
inputs image data as an object of identification. 
Although this input image data is a grayscale image in 
this embodiment, an RGB color image may also be used. 

In step S302, an initial feature extractor 22 extracts at
least one initial feature, such as an edge in a specific
direction, of the input image. In step S303, a local
feature extractor 23 extracts local features, e.g., an edge
line segment having a specific length and the end points of
this edge line segment, by using the initial features
extracted by the initial feature extractor 22. In step
S304, a partial feature extractor 24 extracts partial
features such as the eye and mouth by using the local
features extracted by the local feature extractor 23.

In step S305, a partial feature distribution determinator
25 analyzes the distribution, in the image, of the partial
features extracted by the partial feature extractor 24. In
step S306, in accordance with the analytical result, the
partial feature distribution determinator 25 issues an
activation instruction to a face extractor 26, and turns on
flags of face extraction modules to be activated.

The face extractor 26 is a processor which extracts the
face by using the partial features extracted by the partial
feature extractor 24. The face extractor 26 is made up of a
plurality of modules each of which extracts the face in
accordance with a specific size or direction, and only
modules having received the activation instruction perform
face extraction. In steps S307 to S309, face extraction
modules having ON flags sequentially perform the face
extraction process, and the flag of each face extraction
module having executed face extraction is turned off. If
there is no more face extraction module having an ON flag,
the face extraction process is terminated.
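
The flag handling of steps S307 to S309 can be sketched as
below. This is an illustrative outline only: the function
name run_flagged_modules, the module names, and the lambda
bodies standing in for the actual extractors are
assumptions, not part of this description.

```python
def run_flagged_modules(modules, flags, partial_features):
    """Run every module whose flag is ON, then turn that flag OFF."""
    results = {}
    for name, extract in modules.items():
        if flags.get(name, False):
            results[name] = extract(partial_features)
            flags[name] = False  # this module has executed face extraction
    return results


# Hypothetical modules; each stands in for a size- or direction-specific extractor.
modules = {
    "size": lambda features: 0.9 * features["eye"],
    "planar_rotation": lambda features: 0.4 * features["mouth"],
}
flags = {"size": True, "planar_rotation": False}
results = run_flagged_modules(modules, flags, {"eye": 1.0, "mouth": 1.0})
```

After the call no flag remains ON, which corresponds to the
termination condition of the face extraction process.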

In steps S310 and S311, a detection result output unit 27
integrates the face extraction results from the face
extraction modules, determines whether the input image is a
face image or a non-face image, and outputs the
determination result.

Details of the processing performed by each processor on
and after the initial feature extractor 22 for the image
data input from the image input unit 21 will be described
below.

Initial features extracted from the input image by the
initial feature extractor 22 are desirably constituent
elements of features to be extracted by the local feature
extractor 23 as a higher layer. In this embodiment, a
filtering process is simply performed in each position of
an input image by using differential filters in a
longitudinal direction, a lateral direction, an oblique
direction toward the upper right corner, and an oblique
direction toward the upper left corner, thereby extracting
four types of features, i.e., a vertical edge, a horizontal
edge, and two oblique edges. Although the filtering process
as described above is performed in this embodiment, it is
also possible to extract features by performing template
matching in each position of an input image by using a
prepared template image indicating initial features.
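
The filtering by four directional differential filters can
be sketched as follows. The description does not give the
filter coefficients, so the 3x3 kernels in KERNELS are
hypothetical, as are the function names; border positions
are simply left at 0 in this sketch.

```python
# Hypothetical 3x3 differential kernels; the coefficients are assumptions.
KERNELS = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    "right_oblique_edge": [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    "left_oblique_edge": [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
}


def filter_image(image, kernel):
    """Apply a 3x3 kernel at every interior position of a list-of-rows image."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = sum(
                kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return out


def extract_initial_features(image):
    """Return one signed response map per edge type."""
    return {name: filter_image(image, kernel) for name, kernel in KERNELS.items()}
```

For an image whose luminance steps from dark to bright from
left to right, the vertical-edge map responds strongly at
the step while the horizontal-edge map stays at 0 there.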

The extracted feature is held as information such as the
type of the feature, the position in the image, the
likelihood of the feature to be extracted, and the feature
detection level. In this embodiment, features indicated by
a to d in Fig. 5 are extracted from the input image (i in
Fig. 4) in this stage. Referring to Fig. 5, a, b, c, and d
indicate the extraction results of a vertical edge, a
horizontal edge, a rightward oblique edge, and a leftward
oblique edge, respectively.

In Fig. 5, a position where the result of filtering in each
position of the image is 0 is gray, positive values are
represented by high luminance values, and negative values
are represented by low luminance values. That is, in the
images shown in Fig. 5, in a position having a high
luminance value, an edge in a direction corresponding to
the type of each filter is extracted. In a position having
a low luminance value, an edge in a direction opposite to
the direction corresponding to the type of each filter is
present. A gray portion having an intermediate luminance
value indicates a position where no edge is extracted.

Since differential filters are used to extract features,
the absolute value of a value obtained by filtering
indicates the sharpness of an edge. That is, in each
position of the input image, the larger the change in
luminance value in a direction corresponding to the type of
filter, the further the luminance value of that position in
Fig. 5 departs from gray, toward a higher or a lower value.

Similar to the features extracted by the initial feature
extractor 22, the local features extracted by the local
feature extractor 23 by using the initial feature
extraction results obtained by the initial feature
extractor 22 are desirably constituent elements of features
to be extracted by the partial feature extractor 24 as a
higher layer.




In this embodiment, the partial feature extractor 24
extracts the eye and mouth. Therefore, the local feature
extractor 23 extracts features as indicated by portions
surrounded by circles in 1-a to 4-d of Fig. 6. That is, the
local feature extractor 23 extracts two types of features,
i.e., the left and right end points as the end points of an
edge line segment corresponding to, e.g., the corners of
the eye or the two ends of the mouth. The local feature
extractor 23 also extracts two types of edge line segments
having specific lengths, i.e., a feature corresponding to
the upper portion of the eye or the upper portion of the
lips, and a feature corresponding to the lower portion of
the eye or the lower portion of the lips.

1-a to 1-d in Fig. 6 indicate the initial feature
extraction results in a position where the left end point
(the inner corner of the left eye in Fig. 6) is present.
That is, 1-a, 1-b, 1-c, and 1-d indicate the extraction
results of the vertical edge, the horizontal edge, the
rightward oblique edge, and the leftward oblique edge,
respectively. 2-a, 2-b, 2-c, and 2-d indicate the
extraction results of the initial features (vertical,
horizontal, rightward oblique, and leftward oblique edges,
respectively) in a position where the right end point (the
end point of the mouth in Fig. 6) is present. 3-a, 3-b,
3-c, and 3-d indicate the extraction results of the initial
features (vertical, horizontal, rightward oblique, and
leftward oblique edges, respectively) in a position where
the upper portion of the eye or the upper portion of the
lips (the upper portion of the right eye in Fig. 6) is
present. 4-a, 4-b, 4-c, and 4-d indicate the extraction
results of the initial features (vertical, horizontal,
rightward oblique, and leftward oblique edges,
respectively) in a position where the lower portion of the
eye or the lower portion of the lips (the lower portion of
the lips in Fig. 6) is present.

In this embodiment, a method of extracting each feature is
as follows. First, a two-dimensional mask unique to each
feature extracted by the initial feature extractor 22 is
prepared. Then, in each position of the feature extraction
results as indicated by a to d in Fig. 5, a filtering
process (convolution arithmetic) is performed using the
two-dimensional mask unique to a feature to be extracted.
Each feature is extracted by integrating the results of
filtering performed for the individual initial feature
extraction results.
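
The extraction of one local feature from the four initial
feature maps can be sketched as follows. This is a minimal
illustration under stated assumptions: integration is taken
to be a plain sum over the initial feature types, positions
outside a map count as 0, and the names correlate_at and
extract_local_feature are hypothetical.

```python
def correlate_at(feature_map, mask, y, x):
    """Mask response centred at (y, x); positions outside the map count as 0."""
    mask_h, mask_w = len(mask), len(mask[0])
    height, width = len(feature_map), len(feature_map[0])
    total = 0.0
    for j in range(mask_h):
        for i in range(mask_w):
            yy = y + j - mask_h // 2
            xx = x + i - mask_w // 2
            if 0 <= yy < height and 0 <= xx < width:
                total += mask[j][i] * feature_map[yy][xx]
    return total


def extract_local_feature(initial_maps, masks):
    """Filter each initial feature map with its own mask and sum the results."""
    names = list(masks)
    height = len(initial_maps[names[0]])
    width = len(initial_maps[names[0]][0])
    return [
        [sum(correlate_at(initial_maps[n], masks[n], y, x) for n in names)
         for x in range(width)]
        for y in range(height)
    ]
```

One such set of masks would be prepared per local feature
(for example, one set for the left end point), so that the
summed map is large where all four initial feature
distributions match that feature.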

The prepared unique two-dimensional mask corresponds to the
distribution (1-a to 1-d in Fig. 6) of the initial feature
extraction results in a position where the feature to be
extracted (e.g., the feature such as the left end point in
Fig. 6) is present. That is, the two-dimensional mask is so
set that the value obtained by filtering is large if the
distribution of the initial feature extraction results is
the one unique to the vicinity of a position where the
feature to be extracted is present.

The two-dimensional mask is set as follows. 
First, a plurality of test patterns are simply given, 
and the value of each element of the two-dimensional 
mask is so adjusted that the result of filtering has a 
large value if the given test pattern is a feature to 
be extracted. Also, the value of each element of the 
two-dimensional mask is so adjusted that the result of 
filtering has a small value if the given test pattern 
is not a feature to be extracted. It is also possible 
to set the value of each element of the two-dimensional 
mask by using knowledge obtained in advance. 
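
One way to adjust the mask elements from test patterns is
sketched below. The description does not state the
adjustment rule, so the delta-rule update used here is an
assumption chosen for illustration, as are the names
respond and tune_mask, the learning rate, and the target
outputs of 1 (feature) and 0 (not a feature).

```python
def respond(mask, pattern):
    """Filter output for a pattern that has the same size as the mask."""
    return sum(
        m * p
        for mask_row, pattern_row in zip(mask, pattern)
        for m, p in zip(mask_row, pattern_row)
    )


def tune_mask(mask, patterns, targets, rate=0.1, epochs=50):
    """Nudge mask elements so the output approaches each pattern's target."""
    for _ in range(epochs):
        for pattern, target in zip(patterns, targets):
            error = target - respond(mask, pattern)
            for j, pattern_row in enumerate(pattern):
                for i, p in enumerate(pattern_row):
                    mask[j][i] += rate * error * p
    return mask


# Hypothetical 2x2 test patterns: one is the feature, the other is not.
feature_pattern = [[1, 0], [0, 1]]
other_pattern = [[0, 1], [1, 0]]
mask = tune_mask([[0.0, 0.0], [0.0, 0.0]],
                 [feature_pattern, other_pattern], [1.0, 0.0])
```

After tuning, the mask gives a large output for the feature
pattern and a small output for the other pattern, which is
the property required of the mask above.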

As in the initial feature extractor 22, each feature
extracted by the processing as described above is held as
information such as the type of the extracted feature, the
position in the image, the likelihood of the feature to be
extracted, and the feature detection level. In this
embodiment, for each of the four types of features, i.e.,
the two types of end points and the edge line segments
having the two types of specific lengths, filtering is
performed for each initial feature by using the position
where the feature is extracted and the two-dimensional mask
unique to the feature. The results of filtering are
integrated and recorded as the likelihood of the feature.

The processing performed by the partial feature extractor
24 is analogous to that performed by the local feature
extractor 23; partial features are extracted from a
plurality of local feature extraction results obtained by
the local feature extractor 23 as the feature extraction
results of a lower layer. The partial features to be
extracted are also desirably constituent elements of
features to be extracted by the face extractor 26 as a
higher layer, i.e., constituent elements of the face in
this embodiment.

In this embodiment as described above, the partial feature
extractor 24 extracts, e.g., the eye and mouth. The process
of extraction is the same as the extraction method of the
local feature extractor 23; features need only be extracted
by filtering using specific two-dimensional masks.
Alternatively, it is also possible to simply extract the
eye and mouth in accordance with whether, in the feature
extraction results obtained by the local feature extractor
23, features having likelihoods of a predetermined value or
more have a specific spatial positional relationship.
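
The alternative based on spatial positional relationships
can be sketched as below. The description does not define
the relationship, so the rule used here (a right end point
lying within max_dx to the right of a left end point and
within max_dy vertically) is a hypothetical example, as are
the function name find_pairs and all numeric values.

```python
def find_pairs(left_points, right_points, min_likelihood, max_dx, max_dy):
    """Return midpoints of left/right end-point pairs that satisfy the relationship.

    Each point is a (y, x, likelihood) tuple.
    """
    pairs = []
    for ly, lx, left_likelihood in left_points:
        for ry, rx, right_likelihood in right_points:
            strong = (left_likelihood >= min_likelihood
                      and right_likelihood >= min_likelihood)
            placed = 0 < rx - lx <= max_dx and abs(ry - ly) <= max_dy
            if strong and placed:
                pairs.append(((ly + ry) / 2, (lx + rx) / 2))
    return pairs


# Hypothetical local feature extraction results.
left = [(10, 5, 0.9), (30, 5, 0.2)]
right = [(11, 15, 0.8), (10, 60, 0.9)]
candidates = find_pairs(left, right, min_likelihood=0.5, max_dx=20, max_dy=2)
```

Each returned midpoint would be a candidate position of a
partial feature such as the eye.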

Each of the eye and mouth extracted as described above is
also held as information such as the type of the extracted
feature, the position in the image, the likelihood of the
feature to be extracted, and the feature amount. In this
embodiment, the results of filtering performed for the
local feature extraction results by using the
two-dimensional masks unique to the eye and mouth are
integrated in each position of the image, and held as the
likelihood in the position of each partial feature.

The partial feature distribution determinator 25 performs
simple distribution analysis on the feature extraction
results obtained by the partial feature extractor 24. In
addition, on the basis of the analytical result, the
partial feature distribution determinator 25 gives an
activation instruction to one or a plurality of
predetermined face extraction modules of the face extractor
26.

Unlike in the processes performed from the initial feature
extractor 22 to the partial feature extractor 24, the
analysis herein mentioned checks necessary conditions for
each predetermined face extraction module to which the
activation instruction is to be given. For example, in this
embodiment, this analysis determines whether the eye is
extracted near predetermined coordinates in the input image
by the processing of the partial feature extractor 24. The
analysis also determines whether the barycentric position
of the mouth extraction results obtained by the processing
of the partial feature extractor 24 is in the vicinity of
the predetermined coordinates. Alternatively, the analysis
determines whether the total of the likelihoods of the eye
as the processing results of the partial feature extractor
24 is equal to or larger than a predetermined value.

These analyses as described above can be performed by
presetting conditions corresponding to modules which make
up the face extractor 26 and perform face extraction
corresponding to a plurality of variances. The variances
herein mentioned are changes in features obtained by, e.g.,
affine transformation such as rotational transformation and
size transformation, and transformation corresponding to,
e.g., a case in which the face is turned to the side. For
example, one necessary condition set for a face extraction
module corresponding to a clockwise planar rotational
variance is that the barycenter of the mouth extraction
results is present off to the lower left of the center of
the image, and the barycenter of the eye extraction results
is off to the upper right of the barycenter of the mouth
extraction results.

Several analyses as described above are performed, and an
activation instruction is issued to predetermined face
extraction modules meeting the conditions of analysis. The
barycenters and the total of likelihoods may also be
analyzed within a predetermined range, e.g., a position
where the eye is expected to exist. It is also possible to
compare the totals of likelihoods of two or more features.
Since modules for feature extraction are thus selected by
the analyses having the simple necessary conditions as
described above, the processing cost can be reduced, and
identification errors can also be reduced.

In the face extractor 26, only predetermined face
extraction modules having received the activation
instruction from the partial feature distribution
determinator 25 perform a feature extraction process
similar to that of the partial feature extractor 24 by
using the extraction results of the eye and mouth obtained
by the partial feature extractor 24. Examples of prepared
modules corresponding to specific variances are a module
specialized to a variance in size (ii in Fig. 4), a module
specialized to a variance caused by planar rotation (iii in
Fig. 4), a module specialized to a variance caused by a
horizontal shake of the face (iv in Fig. 4), and a module
specialized to a variance caused by a vertical shake of the
face.

In this embodiment, a specific two-dimensional mask is
prepared for each module corresponding to the variance as
described above, and only a module having received the
activation instruction performs filtering by using the
specific two-dimensional mask. The two-dimensional mask is
set in the same manner as explained for the local feature
extractor 23; the two-dimensional mask is set by giving, as
a test pattern, a face having a specific variance
corresponding to a module so that the module is specialized
to the corresponding variance.

This face extraction is performed by using the face around
the center of the image as a target. Therefore, unlike the
feature extraction processes up to the partial feature
extractor 24, filtering need not be performed in each
position of the image but need only be performed within the
face extraction range of the image.

The detection result output unit 27 performs final input
image category classification from the results of filtering
performed by those modules corresponding to the variances,
which have received the activation instruction and
performed the face extraction process. In this embodiment,
the detection result output unit 27 simply determines
whether the output value of each activated face extraction
module has exceeded a threshold value set for the module.
If the output value of at least one module has exceeded the
threshold value, the detection result output unit 27
determines that the input image is a face image; if not,
the detection result output unit 27 determines that the
input image is a non-face image.
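
The determination rule of the detection result output unit
27 can be sketched as follows. The function name is_face,
the module names, and the numeric values are hypothetical;
only the outputs of activated modules are assumed to be
passed in.

```python
def is_face(outputs, thresholds):
    """True if at least one activated module's output exceeds its own threshold."""
    return any(value > thresholds[name] for name, value in outputs.items())


# Hypothetical per-module thresholds and outputs of the activated modules.
thresholds = {"size": 0.8, "planar_rotation": 0.5}
decision = is_face({"size": 0.9, "planar_rotation": 0.1}, thresholds)
```

When no module was activated, the outputs are empty and the
image is classified as a non-face image.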


This determination is not limited to the above method. For
example, final determination may also be performed by
integrating the output values of the activated modules.
More specifically, identification errors can be reduced by
suppressing the outputs of modules having conflicting
variances. For example, it is possible to subtract, from
the output value of a module corresponding to a clockwise
planar rotational variance, the output value of a module
corresponding to a counterclockwise planar rotational
variance, as an opposite variance category, after the
latter output value is weighted by a predetermined weight.

Also, the threshold values for identification can be
increased by promoting the outputs of modules corresponding
to similar variances. As a consequence, identification
errors can be reduced. For example, it is possible to add,
to the output value of a module corresponding to a face
having a specific size, the output value of a module
corresponding to a face having a size slightly larger than
the specific size, which is a similar variance category,
after a predetermined weight is applied to the latter output
value.
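
The suppression of conflicting variances and the promotion
of similar variances can be sketched together as follows.
This is only an illustration under stated assumptions: the
function name integrate_outputs, the module names, the
opposite/similar relations, and the weights 0.5 and 0.3 are
all hypothetical, and "applying a weight" is assumed to mean
multiplying the other module's output by that weight.

```python
def integrate_outputs(outputs, opposite, similar,
                      w_suppress=0.5, w_promote=0.3):
    """Adjust each module output using its opposite and similar modules.

    outputs:  {module name: raw output value}
    opposite: {module name: names of modules with conflicting variances}
    similar:  {module name: names of modules with similar variances}
    The weights are hypothetical; raw (unadjusted) outputs are always used
    on the right-hand side, so the result does not depend on module order.
    """
    adjusted = {}
    for name, value in outputs.items():
        for other in opposite.get(name, ()):
            value -= w_suppress * outputs.get(other, 0.0)  # suppression
        for other in similar.get(name, ()):
            value += w_promote * outputs.get(other, 0.0)   # promotion
        adjusted[name] = value
    return adjusted


outputs = {"rot_cw": 0.8, "rot_ccw": 0.2, "size_m": 0.5, "size_l": 0.4}
opposite = {"rot_cw": ["rot_ccw"], "rot_ccw": ["rot_cw"]}
similar = {"size_m": ["size_l"]}
adjusted = integrate_outputs(outputs, opposite, similar)
```

With these values the clockwise output drops from 0.8 to
about 0.7, and the medium-size output rises from 0.5 to
about 0.62.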

It is also possible to perform weighted addition or a simple
arithmetic mean operation for the output values of two or
more modules corresponding to similar categories as
described above, and newly set the obtained value as an
output value of a virtual feature extraction module
corresponding to an intermediate variance between the
categories. Consequently, high-accuracy identification can
be performed with a low processing cost without any
identification errors.
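
A virtual module of this kind can be sketched as a weighted
addition that falls back to the simple arithmetic mean when
no weights are given. The function name
virtual_module_output and the example values are
hypothetical.

```python
def virtual_module_output(values, weights=None):
    """Combine the outputs of modules for similar variance categories.

    With weights=None this is the simple arithmetic mean; otherwise it is
    the weighted addition sum(w * v). Names and values are hypothetical.
    """
    if weights is None:
        weights = [1.0 / len(values)] * len(values)
    return sum(w * v for w, v in zip(weights, values))


# Outputs of two hypothetical modules, e.g. for two neighboring face sizes.
mean_output = virtual_module_output([0.4, 0.8])
weighted_output = virtual_module_output([0.4, 0.8], [0.25, 0.75])
```
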

The above first embodiment is explained as an example of a
method of identifying whether input two-dimensional image
data belongs to a certain specific category. A face image,
in which the center of a face is present in substantially
the center of the input image, and a non-face image, which
is any image other than the face image, are assumed as
identification categories, and to which of these two
categories the input image data belongs is identified.
(Second Embodiment) 

In the second embodiment, a method of detecting the position
of a face in input two-dimensional image data will be
described as a modification of the above first embodiment.
In this embodiment, a process of detecting the face in an
image will be explained below. However, as in the first
embodiment, the application of the present invention is not
limited to the process of detecting the face in an image.
That is, the present invention is also applicable to a
process of detecting another image pattern or a
predetermined pattern from input voice data. In addition,
the present invention can be applied to detection of objects
of a plurality of categories.

In this embodiment, as a method of detecting, with
robustness against variances, a specific pattern from
two-dimensional image data by hierarchical feature
extraction, the basic configuration of a convolutional
neural network (to be referred to as a CNN hereinafter) is
changed. Fig. 7 shows the basic CNN arrangement. The basic
processing of the CNN will be explained below with reference
to Fig. 7. In Fig. 7, the processing flows from the left
end, which is the input end, to the right.

In Fig. 7, reference numeral 71 denotes a pixel value
distribution corresponding to, e.g., the luminance value of
an input image. Reference numerals 72, 74, 76, and 78 denote
feature detecting layers. Reference numerals L7-21, L7-22,
L7-23, L7-24, L7-41, L7-42, L7-43, L7-44, L7-61, L7-62, and
L7-81 in these layers denote feature detecting cell planes.
Reference numerals 73, 75, and 77 denote feature integrating
layers. Reference numerals L7-31, L7-32, L7-33, L7-34,
L7-51, L7-52, L7-53, L7-54, L7-71, and L7-72 in these layers
denote feature integrating cell planes.

In the CNN, two layers, i.e., a feature detecting layer and
a feature integrating layer, are combined as one set, and
these sets are hierarchically arranged. Each feature
detecting cell plane in a feature detecting layer has
feature detecting neurons which detect a certain specific
feature. Each feature detecting neuron is connected to the
feature detection result of a layer in the preceding stage
by a weight distribution unique to each feature detecting
cell plane, within a local range corresponding to the
position of the feature detecting neuron. For example, a
feature detecting neuron in the feature detecting layer 74
is connected to the feature detection results from L7-31 to
L7-34, and a feature detecting neuron in the feature
detecting layer 72 is connected to the input image 71, by a
weight distribution unique to each feature detecting cell
plane (e.g., L7-21).

This weight corresponds to a differential filter for
extracting an edge or a two-dimensional mask for extracting
a specific feature described in the first embodiment. As
described in the first embodiment, this weight can be set by
using knowledge obtained in advance, or by learning in which
a plurality of test patterns are given. It is also possible
to set the weight by using a known neural network learning
method, e.g., learning using the back propagation method, or
self-organizing learning using Hebb's learning rule.

Each feature detecting neuron performs weighted addition,
with predetermined weights, of the feature detection results
of the feature cell plane to which it is connected. If the
neuron is in the feature detecting layer 72, it performs
weighted addition of the luminance values or the like of an
input image. In addition, the value of the operation result
is transformed by a nonlinear function such as a hyperbolic
tangent function, and the obtained value is used as the
output value of the feature detecting neuron, thereby
detecting a feature.

For example, if L7-21 is a cell plane for detecting a
vertical edge, each feature detecting neuron in L7-21
performs weighted addition corresponding to a differential
filter, with respect to the luminance value of an input
image. In this manner, in a position of the input image
where a vertical edge is present, the value of the operation
result performed by the feature detecting neurons in L7-21
increases, and this increases the output value. That is, a
feature is detected.
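
The operation of one feature detecting cell plane (local
weighted addition followed by a hyperbolic tangent) can be
sketched as follows. This is a minimal sketch, not the
network of the embodiment: the function name detect_feature,
the 3x3 kernel VERTICAL_EDGE, and the toy image are
hypothetical, and only positions where the kernel fits
entirely inside the image are evaluated.

```python
import math


def detect_feature(image, kernel):
    """Apply one cell plane's weight distribution at every valid position.

    image and kernel are lists of rows of floats. Each output value is
    tanh(weighted sum over the local range), as for a feature detecting
    neuron. The output is smaller than the input (no padding).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    plane = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = sum(kernel[j][i] * image[y + j][x + i]
                        for j in range(kh) for i in range(kw))
            row.append(math.tanh(total))
        plane.append(row)
    return plane


# Hypothetical differential filter: responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [[-1.0, 0.0, 1.0]] * 3

# Toy luminance image with a vertical edge between columns 2 and 3.
image = [[0.0, 0.0, 0.0, 1.0, 1.0]] * 4
plane = detect_feature(image, VERTICAL_EDGE)
```

In this toy example the neurons whose local range covers the
edge output values close to 1, while the neuron over the
flat dark area outputs 0.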

This similarly applies to other feature detecting cell
planes; in a position of each feature detecting cell plane
where a specific feature is detected, a feature detecting
neuron outputs a large value. Although the output value is
generally calculated by nonlinear transformation as
described above, the calculation method is not limited to
this transformation.

Each feature integrating cell plane (e.g., L7-31) in a
feature integrating layer (e.g., 73) has a feature
integrating neuron which is connected to one feature
detecting cell plane (e.g., L7-21) of a feature detecting
layer (e.g., 72) as a layer in the preceding stage, and
connected within a local range to the feature detecting
results in the preceding stage to diffuse (integrate) the
feature detecting results. Each feature integrating neuron
basically performs the same arithmetic as the feature
detecting neuron described above. The characteristic of this
feature integrating neuron is that a weight distribution
corresponding to a specific two-dimensional mask is a
Gaussian filter or a low-pass filter.
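
The diffusion performed by a feature integrating neuron can
be sketched as follows. The 3x3 binomial mask GAUSSIAN_3X3
is an assumed stand-in for the Gaussian filter (the text
does not give the mask size or values), the function name
integrate_feature is hypothetical, and positions outside the
plane are assumed to contribute zero.

```python
import math

# 3x3 binomial mask, a common small approximation of a Gaussian filter
# (an assumed choice; size and values are not given in the text).
GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]


def integrate_feature(plane, mask=GAUSSIAN_3X3):
    """Diffuse one feature detecting cell plane with a square low-pass mask.

    Positions outside the plane are treated as zero, so the output has the
    same size as the input. The same arithmetic as a feature detecting
    neuron is used: weighted addition followed by tanh.
    """
    height, width = len(plane), len(plane[0])
    half = len(mask) // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for j, mask_row in enumerate(mask):
                for i, weight in enumerate(mask_row):
                    yy, xx = y + j - half, x + i - half
                    if 0 <= yy < height and 0 <= xx < width:
                        total += weight * plane[yy][xx]
            row.append(math.tanh(total))
        out.append(row)
    return out


# A single strong detection in the middle of a 3x3 plane.
detected = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
integrated = integrate_feature(detected)
```

The single detection is spread to its neighbors, so a small
positional shift of the feature still yields a response at
the same location in the next layer.
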

The network structure of the CNN gradually detects
high-order features from initial features by using the
hierarchical feature detecting and integrating processes as
described above, and finally categorizes the input. Specific
image detection can be performed by detecting high-order
features from an input image by the above processing. The
CNN is characterized in that identification which is robust
against variances having various patterns can be performed
by hierarchical feature extraction and by diffusion by the
feature integrating layers.

This embodiment will be described below by taking the CNN
described above as the basic hierarchical feature extraction
process configuration. Fig. 8 shows the arrangement of
processors according to this embodiment. Fig. 9 shows the
flow of processing according to this embodiment. The
processing of this embodiment will be explained below with
reference to Figs. 8 and 9.

Referring to Fig. 8, an image input unit 801, initial
feature extractor 802, local feature extractor 803, and
partial feature extractor 804 are similar to the image input
unit 21, initial feature extractor 22, local feature
extractor 23, and partial feature extractor 24,
respectively, of the first embodiment. Also, processes in
steps S901 to S904 are the same as in steps S301 to S304 of
Fig. 3.

In this embodiment, an RGB color image is used in the image
input unit 801, and a grayscale image obtained by converting
this RGB color image is input to the initial feature
extractor 802 in the next layer. In addition, processing
performed by the CNN described above is used in feature
extraction, and each feature extractor detects features in a
feature detecting layer and integrates the detected features
in a feature integrating layer. The types of features
extracted by the local feature extractor 803 and partial
feature extractor 804 are analogous to those of the first
embodiment. Also, similar to the method of setting a unique
two-dimensional mask explained in the first embodiment, a
weight distribution unique to each feature detecting cell
plane for detecting a feature is set by learning in which a
plurality of test patterns are input.

In this embodiment, features to be extracted by the initial
feature extractor 802 are not limited beforehand. Instead,
the back propagation method is used when features detected
by the local feature extractor 803 are learned, thereby
learning a weight distribution unique to each feature
detecting cell plane for detecting a local feature, and
automatically setting a weight distribution unique to each
feature detecting cell plane for detecting an initial
feature. In this manner, a weight distribution connected to
the input image 71 can be automatically set so that the
initial feature extractor 802 extracts initial features
which make up a local feature detected by the local feature
extractor 803, and are necessary to detect the local
feature.

In step S905, a first face extractor 805 performs the same
processing as the above-mentioned feature extraction method
for the eye and mouth extraction results obtained by the
partial feature extractor 804, thereby extracting the face
in the image.

If the output value from the first face extractor 805
exceeds a predetermined threshold value, a face candidate
existence determinator 806 determines that a candidate for
the face exists (step S906). Then, the face candidate
existence determinator 806 sets the number of face
candidates in Count (step S907), sequentially outputs the
coordinates of the face candidate existence positions found
to have the face candidates, and issues an activation
instruction to a skin color region extractor 807 and partial
feature distribution determinator 808 (step S908).

When receiving the activation instruction from the face
candidate existence determinator 806, the skin color region
extractor 807 extracts a skin color region from the input
image within a range based on the face candidate existence
position coordinates (step S909). The partial feature
distribution determinator 808 determines the distribution of
the partial feature extraction results within the range
based on the face candidate existence position coordinates
(step S910). In addition, as in the first embodiment, the
partial feature distribution determinator 808 turns on the
flags of face extraction modules to be activated (step
S911).

The partial feature distribution determinator 808 of this
embodiment differs from the partial feature distribution
determinator 25 of the first embodiment in that the partial
feature distribution determinator 808 uses not only the
feature extraction results from the partial feature
extractor 804 but also the skin color region extraction
results from the skin color region extractor 807. The
partial feature distribution determinator 808 performs
simple distribution analysis on these feature extraction
results, and issues an activation instruction to a second
face extractor 809, which includes face extraction modules
corresponding to a plurality of variances. Note that one
face extraction module in this embodiment corresponds to one
feature detecting cell plane in the CNN.

As in the first embodiment, the second face extractor 809
causes face extraction modules corresponding to variances to
perform face extraction. That is, the second face extractor
809 sequentially causes face extraction modules having ON
flags to perform face extraction at the face candidate
existence position coordinates, and turns off the flags of
the face extraction modules having executed face extraction
(steps S911 to S914).

Unlike in the first embodiment, the face extraction process
in this embodiment extracts a face corresponding to specific
variances by using not only the eye and mouth feature
extraction results obtained by the partial feature extractor
804, but also the feature extraction results corresponding
to the upper portion of the eye or the upper portion of the
lips obtained by the local feature extractor 803, and the
skin color region extraction results obtained by the skin
color region extractor 807.

On the basis of the face extraction results from the second
face extractor 809, a detection result output unit 810
outputs a result indicating the position of the face in the
input image. That is, the detection result output unit 810
integrates the output results from the individual modules
(step S914), and outputs a detection result in the face
candidate existence position (step S915). The flow then
loops to detection in the next face candidate existence
position (steps S917 and S918).
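
The control flow from candidate determination to result
output can be sketched as follows. This is a rough
illustration only: the function name detect_faces, the
callables select_modules, modules, and integrate, and all
values are hypothetical stand-ins for the processors 806 to
810, and a plain loop over the candidate list is assumed in
place of the Count-based loop of steps S917 and S918, whose
details are not given here.

```python
def detect_faces(candidates, select_modules, modules, integrate):
    """Run flagged face extraction modules at each face candidate position.

    candidates:     list of (x, y) face candidate existence positions
    select_modules: callable(pos) -> names of modules to activate
                    (stands in for the distribution analysis)
    modules:        {name: callable(pos) -> output value}
    integrate:      callable({name: output}) -> detection result
    All of these are hypothetical stand-ins.
    """
    results = []
    for pos in candidates:                    # next candidate position
        flags = {name: False for name in modules}
        for name in select_modules(pos):      # turn on flags of modules
            flags[name] = True
        outputs = {}
        for name, extract in modules.items():
            if flags[name]:
                outputs[name] = extract(pos)  # face extraction at pos
                flags[name] = False           # turn the flag off afterwards
        results.append((pos, integrate(outputs)))
    return results


candidates = [(10, 12), (40, 8)]
modules = {"frontal": lambda pos: 0.9, "rot_cw": lambda pos: 0.3}
select = lambda pos: ["frontal"] if pos[0] < 20 else ["frontal", "rot_cw"]
integrate = lambda outputs: max(outputs.values(), default=0.0)
detections = detect_faces(candidates, select, modules, integrate)
```

Only flagged modules run, and only at candidate positions,
which is what keeps the additional processing cost small.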

Details of the processes performed by the processors from
the first face extractor 805 onward in this embodiment will
be explained below.

The face extraction process performed by the first face
extractor 805 is the same as the feature extraction
processes performed by the local feature extractor 803 and
partial feature extractor 804. This face extraction is made
up of only one module, although the face extractor 26 of the
first embodiment has a plurality of face extraction modules
corresponding to variances. Also, unlike in the first
embodiment, the position of a face in an image is detected
in this embodiment. Therefore, face extraction is performed
not only near the center of the image but also in different
positions of the image.

A unique weight distribution of each face detecting neuron,
which is used in extraction and connected to the partial
feature extraction result obtained by the partial feature
extractor 804, is set on the basis of learning in which
faces having various variances (e.g., the faces indicated by
i to iv in Fig. 4) are given as test data. This learning
increases the possibility that a non-face portion is
regarded as a face, i.e., decreases the accuracy. However,
faces having various variances can be extracted by a single
module. This processor detects features by using the learned
weight distribution as described above, and the feature
integrating layer integrates the results.

For the results of the face extraction process performed by
the first face extractor 805, the face candidate existence
determinator 806 determines a portion where the output is
equal to or larger than a predetermined threshold value. The
face candidate existence determinator 806 determines that a
face candidate exists in the determined position, and issues
an activation instruction to the skin color region extractor
807 and partial feature distribution determinator 808 to
perform processing within the range in which this candidate
exists.
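
The candidate determination can be sketched as a threshold
scan over the output of the first face extractor. The
function name find_face_candidates, the toy output map, and
the threshold 0.5 are hypothetical; merging of adjacent
positions into one candidate is not covered by this sketch.

```python
def find_face_candidates(face_map, threshold):
    """Return (x, y) positions whose output is >= threshold.

    face_map is a list of rows holding the outputs of the first face
    extractor. Names and values are hypothetical.
    """
    return [(x, y)
            for y, row in enumerate(face_map)
            for x, value in enumerate(row)
            if value >= threshold]


face_map = [[0.1, 0.2, 0.1],
            [0.1, 0.9, 0.5],
            [0.0, 0.3, 0.1]]
candidates = find_face_candidates(face_map, threshold=0.5)
count = len(candidates)  # corresponds to the number of face candidates
```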

Upon receiving the activation instruction from the face
candidate existence determinator 806, the skin color region
extractor 807 extracts a skin color region near the range
within which the face candidate exists. In this embodiment,
in a region in which a skin color region is extracted, an
RGB color input image is converted into an HSV colorimetric
system, and only pixels within the range of a specific hue
(H) are extracted as a skin color region. A method of
extracting a skin color region is not limited to this
method, so another generally known method may also be used.
For example, it is also possible to extract a skin color
region by using saturation (S) or luminance (V). In
addition, although a skin color region is extracted in this
embodiment, a hair region or the like may also be extracted.
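
The hue-based extraction can be sketched with the standard
colorsys module. The hue range 0.02 to 0.14 (on the 0 to 1
hue scale of colorsys) is an assumed illustrative value, not
one given in the text, and the function name skin_mask is
hypothetical. The lower bound is set above zero because
colorsys reports hue 0 for achromatic (gray) pixels.

```python
import colorsys


def skin_mask(rgb_image, hue_min=0.02, hue_max=0.14):
    """Mark pixels whose hue lies within [hue_min, hue_max].

    rgb_image is a list of rows of (r, g, b) tuples with 0-255 components.
    The hue range is an assumed illustrative value on colorsys' 0-1 scale.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0,
                                                  b / 255.0)
            mask_row.append(hue_min <= hue <= hue_max)
        mask.append(mask_row)
    return mask


# One skin-like pixel, one blue pixel, and one gray pixel.
pixels = [[(224, 172, 105), (0, 0, 255), (128, 128, 128)]]
mask = skin_mask(pixels)
```

Saturation or value could be tested in the same loop if a
stricter criterion is wanted, as the text allows.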

The partial feature distribution determinator 808 performs
the same processing as the partial feature distribution
determinator 25 of the first embodiment. In this embodiment,
the partial feature distribution determinator 808 receives
the activation instruction from the face candidate existence
determinator 806, similar to the skin color region extractor
807, and analyzes the distribution of predetermined feature
extraction results near the range within which the face
candidate exists. In accordance with the result of the
analysis, the partial feature distribution determinator 808
gives an activation instruction to the second face extractor
809 made up of face extraction modules corresponding to a
plurality of specific variances, so as to select
predetermined face extraction modules and perform face
extraction in the face candidate existence position.

The feature extraction results analyzed by the partial
feature distribution determinator 808 are the eye and mouth
extraction results obtained by the partial feature extractor
804, and the skin color region extraction result obtained by
the skin color region extractor 807. This analysis is the
same as in the first embodiment; for each module forming the
second face extractor 809 and corresponding to a variance,
the analysis extracts a necessary condition to be met if a
face exists.

Since this embodiment uses the skin color region extraction
result unlike in the first embodiment, several examples of
the analysis for this result will be explained below. The
simplest example is the analysis of the area of an extracted
skin color region. It is also possible to analyze the aspect
ratio of an extracted skin color region, or analyze the
relative positional relationship between the barycenters of
skin color regions in the upper half and lower half of a
region found to have a face candidate.
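
The three analyses named above (area, aspect ratio, and
relative position of the upper-half and lower-half
barycenters) can be sketched on a boolean skin mask as
follows. The function names, the use of the bounding box for
the aspect ratio, and the toy mask are assumptions made for
illustration.

```python
def barycenter(points):
    """Mean (x, y) position of a non-empty list of pixel coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def analyze_skin_region(mask):
    """Analyze a boolean skin mask covering one face candidate region.

    Returns the area (pixel count), the aspect ratio (width / height of
    the bounding box, an assumed definition), and the offset of the
    lower-half barycenter relative to the upper-half barycenter (None if
    either half has no skin pixels). Returns None for an empty mask.
    """
    points = [(x, y) for y, row in enumerate(mask)
              for x, is_skin in enumerate(row) if is_skin]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    middle = len(mask) // 2
    upper = [p for p in points if p[1] < middle]
    lower = [p for p in points if p[1] >= middle]
    offset = None
    if upper and lower:
        (ux, uy), (lx, ly) = barycenter(upper), barycenter(lower)
        offset = (lx - ux, ly - uy)
    return {"area": len(points), "aspect_ratio": width / height,
            "half_offset": offset}


T, F = True, False
mask = [[F, T, T, F],
        [F, T, T, F],
        [F, F, T, T],
        [F, F, T, T]]
stats = analyze_skin_region(mask)
```

In this toy mask the lower half is shifted one pixel to the
right of the upper half, the kind of horizontal offset that
could indicate planar rotation.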

The first example serves as one necessary condition of a
face extraction module corresponding to a specific size in
accordance with the area. The second example is one
necessary condition of a module corresponding to a
horizontal shake or vertical shake of the face. The third
example can be set as one necessary condition of a module
corresponding to planar rotation of the face. It is also
possible, by using the partial feature extraction results
obtained by the partial feature extractor 804, to compare
the area of a region from which the eye is extracted with
the area of a skin color region, compare the area of a
region from which the eye is not extracted with the area of
the skin color region, or compare the area of the region
from which the eye is not extracted with the area of a
non-skin-color region.

The analysis of the area or the like as described above may
also be performed only in a specific region as described in
the first embodiment. For example, the area of a
non-skin-color region can be analyzed in a region which is
presumably a hair position. A more accurate activation
instruction can be issued by adding this analysis to the
analysis of the eye and mouth extraction results as in the
first embodiment.

The second face extractor 809 is a processor similar to the
face extractor 26 of the first embodiment, and includes a
plurality of face extraction modules corresponding to
specific variances. In this embodiment, unlike in the first
embodiment, face extraction is performed in the face
candidate existence position by using not only the eye and
mouth extraction results obtained by the partial feature
extractor 804, but also the skin color extraction result
obtained by the skin color region extractor 807, the
extraction results of faces having various variances
obtained by the first face extractor 805, and the feature
extraction result, among other features extracted by the
local feature extractor 803, which corresponds to the upper
portion of the eye or the upper portion of the lips.

The accuracy of feature extraction can be increased by thus
subsidiarily using, e.g., the feature extraction result in
the same layer, which is a feature on the same level (in
this embodiment, the first face extraction result), the
feature extraction result externally inserted into the
framework of hierarchical feature extraction (in this
embodiment, the skin color region extraction result), the
feature extraction result in a layer before the immediately
preceding layer (in this embodiment, the feature extraction
result corresponding to the upper portion of the eye or the
upper portion of the lips), and the feature extraction
result in a layer in the subsequent stage (to be explained
in the third embodiment described later). Although this
processing increases the processing cost, the increase in
processing cost can be minimized because only a module
having received the activation instruction from the partial
feature distribution determinator 808 performs the feature
extraction process of the second face extractor 809, and
only in a position where a face candidate exists.

The detection result output unit 810 is a processor similar
to the detection result output unit 27 of the first
embodiment. That is, of the face extraction modules forming
the second face extractor 809 and corresponding to a
plurality of variances, those activated by the activation
instruction from the partial feature distribution
determinator 808 perform feature extraction, and from the
results the detection result output unit 810 determines a
position where the face exists in an image, and outputs the
determination result. As explained in the first embodiment,
the detection accuracy can be increased by integrating the
outputs from a plurality of modules.

The second embodiment described above explains detection of
the position where a face exists, as an example of a method
of detecting a certain specific object in an image of input
two-dimensional image data.
(Third Embodiment) 

The third embodiment of the present invention is a
modification of the second embodiment. As in the second
embodiment, this embodiment performs a process of detecting
the position of a face in an image. However, this embodiment
is also applicable to another image pattern or voice data.
In addition, the embodiment can be applied to detection of
objects of a plurality of categories.

Fig. 10 shows the arrangement of processors of this
embodiment. Fig. 11 shows the flow of processing of this
embodiment. The basic process configuration of this
embodiment is the same as explained in the second
embodiment. The processing of this embodiment will be
explained below with reference to Fig. 10.

Processes (steps S1101 to S1109) performed by components
from an image input unit 1001 to a skin color region
extractor 1007 shown in Fig. 10 are exactly the same as
steps S901 to S909 in the second embodiment, so an
explanation thereof will be omitted.

A partial feature distribution determinator 1008 also
performs the same processing as the partial feature
distribution determinator 808 in the second embodiment.
However, the partial feature distribution determinator 1008
gives an activation instruction to face extraction modules
corresponding to a plurality of variances in a second face
extractor 1009 so as to perform a face extraction process in
a face candidate existence position, in accordance with the
analytical result of the distribution of feature extraction
results, and also gives an activation instruction to a
second partial feature extractor 1011 made up of partial
feature extraction modules corresponding to a plurality of
variances. That is, the partial feature distribution
determinator 1008 determines the distribution of partial
feature extraction results within a range based on the face
candidate existence position coordinates (step S1110), and
turns on the flags of face extraction modules to be
activated (step S1111).

The second partial feature extractor 1011 includes a
plurality of modules for extracting partial features
corresponding to specific variances. Upon receiving the
activation instruction from the partial feature distribution
determinator 1008, a module in the second partial feature
extractor 1011 re-extracts a partial feature only in a
specific position determined by the face candidate existence
position. That is, a partial feature extraction module
corresponding to a face extraction module having an ON flag
performs a partial feature extraction process in a position
determined by the face candidate existence position
coordinates (steps S1113 and S1114).

The second face extractor 1009 is a processor substantially
the same as the second face extractor 809 of the second
embodiment. However, if the second partial feature extractor
1011 re-extracts partial features corresponding to the
activated face extraction modules, the second face extractor
1009 performs face extraction by using these re-extracted
features rather than the features extracted by a partial
feature extractor 1004. That is, the second face extractor
1009 performs face extraction in the face candidate
existence position by using a face extraction module having
an ON flag, and turns off the flag of the face extraction
module having executed face extraction (steps S1115 and
S1116).

A detection result output unit 1010 is exactly the same as
the detection result output unit 810 of the second
embodiment, and steps S1117 to S1120 are also exactly the
same as steps S915 to S918 of the second embodiment, so an
explanation thereof will be omitted.

Details of the processes in the partial feature distribution
determinator 1008, second partial feature extractor 1011,
and second face extractor 1009 of this embodiment will be
described below.

As described above, the partial feature distribution
determinator 1008 is the same as the second embodiment in
the process of analyzing the distribution of partial feature
extraction results. In the second embodiment, an activation
instruction is issued to modules which perform face
extraction corresponding to a plurality of variances.
However, the partial feature distribution determinator 1008
also issues an activation instruction to the second partial
feature extractor 1011 which extracts partial features
corresponding to the variances of the face extraction
modules to which the activation instruction is issued.

More specifically, when issuing an activation instruction to
a face extraction module corresponding to, e.g., a clockwise
planar rotational variance, the partial feature distribution
determinator 1008 simultaneously issues an activation
instruction to a partial feature extraction module
corresponding to the same clockwise planar rotational
variance.

The second partial feature extractor 1011 includes a
plurality of modules which extract partial features
corresponding to a plurality of variances. In the second
partial feature extractor 1011, the partial feature
extraction modules are activated which correspond to those
face extraction modules that have received an activation
instruction from the partial feature distribution
determinator 1008 and perform face extraction corresponding
to a plurality of variances. The activated modules extract
partial features only within a specific range determined by
the face candidate existence position obtained as the result
of the face candidate existence determinator 1006. The
method of feature extraction is the same as explained in the
second embodiment.

Each partial feature extraction module basically corresponds
to each of the face extraction modules forming the second
face extractor 1009 and corresponding to a plurality of
variances. However, this correspondence need not be
one-to-one correspondence. For example, a partial feature
extraction module corresponding to a face extraction module
for a full face may also be omitted. In this case, if an
activation instruction is issued to this face extraction
module for a full face, the second partial feature extractor
1011 need not perform any processing.

Furthermore, one partial feature extraction module may also
correspond to a plurality of types of face extraction
modules. For example, a face extraction module corresponding
to a 15° clockwise planar rotational variance and a face
extraction module corresponding to a 30° clockwise planar
rotational variance can be related to a partial feature
extraction module which singly performs extraction including
these two variances.
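
The correspondence between face extraction modules and
partial feature extraction modules (not necessarily
one-to-one, with some entries omitted) can be sketched as a
lookup table. All module names and the table
PARTIAL_MODULE_FOR are hypothetical.

```python
# Hypothetical correspondence table: face extraction module -> partial
# feature extraction module. None means the partial module is omitted.
PARTIAL_MODULE_FOR = {
    "face_rot_cw_15": "eye_rot_cw",   # two face modules share one
    "face_rot_cw_30": "eye_rot_cw",   # partial feature module
    "face_rot_ccw_15": "eye_rot_ccw",
    "face_full": None,                # no re-extraction for a full face
}


def partial_modules_to_activate(activated_face_modules):
    """Return the partial feature modules to activate, without duplicates."""
    selected = []
    for face_module in activated_face_modules:
        partial = PARTIAL_MODULE_FOR.get(face_module)
        if partial is not None and partial not in selected:
            selected.append(partial)
    return selected


to_run = partial_modules_to_activate(
    ["face_rot_cw_15", "face_rot_cw_30", "face_full"])
```

Here both clockwise face modules map to a single partial
feature module, which is therefore run only once, and the
full-face module triggers no re-extraction.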

As described above, a feedback mechanism which controls the
operation of feature extraction modules in a lower layer on
the basis of the feature extraction result output from a
higher layer is introduced. That is, the accuracy of feature
extraction can be further increased by re-extracting
low-order features by partial feature extraction modules
corresponding to face extraction modules activated in second
face extraction and corresponding to specific variances.
Although this re-extraction of features increases the
processing cost, the increase in processing cost can be
minimized because a module having received an activation
instruction performs processing only in a specific position.

In this embodiment, the second partial feature extractor
1011 performs only eye extraction corresponding to
variances, without any mouth extraction. To further increase
the feature extraction accuracy, mouth extraction
corresponding to variances may also be performed, or
features other than those extracted by the partial feature
extractor 1004 may also be extracted.

Furthermore, in this feature extraction, eye extraction is
performed by using the partial feature extraction results
of, e.g., the eye and mouth obtained by the partial feature
extractor 1004, and the first face extraction results
obtained by the first face extractor 1005, in addition to
the local feature extraction results obtained by the local
feature extractor 1003. As already described in the second
embodiment, the accuracy of the feature extraction process
can be increased by subsidiarily using the feature
extraction result in the same layer which is a feature on
the same level, and the feature extraction result in a
higher layer which is a feature on a higher level.

The second face extractor 1009 basically performs
the same processing as the second face extractor 809 of
the second embodiment. The difference from the second
face extractor 809 of the second embodiment is that if,
in the second partial feature extractor 1011, partial
feature extraction corresponding to variances is
performed in accordance with activated face extraction
modules, face extraction is performed by using the
results of this partial feature extraction
corresponding to the variances performed in the second
partial feature extractor 1011, rather than the partial
feature extraction results obtained by the partial
feature extractor 1004.

In this embodiment, the second partial feature
extractor 1011 performs only eye extraction, so mouth
extraction is performed using the extraction results
from the partial feature extractor 1004. As explained
above in relation to the second partial feature
extractor 1011, if, for example, there is no partial
feature extraction module corresponding to a face
extraction module for a full face, the second partial
feature extractor 1011 does not re-extract any features
when an activation instruction is issued to this face
extraction module for a full face.

In a case like this, the feature extraction
results from the partial feature extractor 1004 can be
directly used. In this embodiment, when partial
feature extraction corresponding to variances is
performed in relation to activated face extraction
modules, the eye extraction result obtained by the
partial feature extractor 1004 is not used. To further
increase the accuracy, however, this feature extraction
result may also be subsidiarily used.

In the third embodiment, which is a modification of
the second embodiment as described above, detection of
the position of a face has been explained as an example
of a method of detecting a certain specific object in
input two-dimensional image data.
(Fourth Embodiment) 
In the fourth embodiment of the present
invention, the connecting form in a hierarchical
structure is changed.

Fig. 13 shows the hierarchical structure of a
pattern identification apparatus according to the
fourth embodiment. The outline of the pattern
identification method will be described with reference
to Fig. 13. A data input unit 131 inputs data for
identifying patterns. The input data is basically
processed from the left side to the right side in
Fig. 13. Features are gradually extracted from
low-order features to high-order features, and an
ultimate high-order feature is extracted.

A feature extraction layer 132 has at least one
feature extraction plane 133. The feature extraction
plane 133 includes a large number of feature extractors
and extracts a predetermined feature using the
extraction result of another coupled feature extraction
plane. The feature extractors within one feature
extraction plane have identical structures and extract
the same type of feature. Each feature extractor
basically extracts a local feature. The predetermined
features are topologically extracted from the input
data by a large number of feature extractors within one
feature extraction plane.
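
A feature extraction plane of this kind, in which identically structured
extractors each look at a local region of a coupled plane, can be sketched
as follows. This is a minimal sketch under assumptions that the description
does not state: the data is one-dimensional, each extractor is modeled as a
weighted sum over a small window, and the names `extract_plane` and
`kernel` are hypothetical.

```python
def extract_plane(prev_plane, kernel):
    """Apply the same local extractor at every position of the coupled plane.

    prev_plane: list of extraction results from the coupled plane.
    kernel: window weights shared by all extractors in this plane
        (assumed model of "identical structures").
    """
    half = len(kernel) // 2
    n = len(prev_plane)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = x + k - half
            if 0 <= i < n:  # positions outside the plane contribute nothing
                acc += w * prev_plane[i]
        out.append(acc)
    return out


# Hypothetical usage: a single response in the coupled plane.
plane_output = extract_plane([0.0, 1.0, 0.0], [1.0, 2.0, 1.0])
```

Since every position uses the same weights, the output keeps the spatial
arrangement of the input, which is one way to read "topologically
extracted".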

The features extracted in a normal feature
extraction plane are used for feature extraction in a
feature extraction layer located immediately succeeding
the normal feature extraction plane. However, as shown
in Fig. 13, features extracted by a reuse feature
extraction plane 133a are used in feature extraction
not only for the layer located immediately succeeding
the plane 133a but also for a high-order feature
extraction layer.

A non-hierarchical feature plane 133b inputs a
feature other than the features hierarchically
extracted from the input data. For example, the
non-hierarchical feature plane 133b inputs, as a
feature, information or the like from a sensor other
than the sensor which supplies the input data.

An intra-layer reuse feature extraction plane
133c extracts a feature used in another feature
extraction plane 133d within the same layer. In this
embodiment, feature extraction is performed using the
features extracted previously within the same layer.
However, after feature extraction is performed in a
higher-order layer, feature extraction may be performed
in a lower-order layer using the extraction result of
the higher-order layer.

With the above processes, features are gradually
extracted from the input data in the order of low-order
features to high-order features, and desired feature
extraction is finally performed to identify the input
data pattern.

Figs. 14A and 14B are views showing the outline
of a result integrating process according to this
embodiment. A feature extraction plane 133 is
identical to that shown in Fig. 13. A feature
extractor 14 is the one described with reference to
Fig. 13. The feature extractors 14 generate outputs
(likelihoods of features corresponding to positions)
Output(x) as the feature extraction result.

The outline of the result integrating process
will be described with reference to Fig. 14A. Each
feature extractor 14a is an excitation or repression
feature extractor. Each feature extractor 14b gives
excitation, while each feature extractor 14c gives
repression. These feature extractors 14 extract
different features at the same position of the input
data.

A feature extracted by the excitation or
repression feature extractor 14a has a high
similarity to a feature extracted by the excitation
feature extractor 14b, but has a low similarity to a
feature extracted by the repression feature extractor
14c. A value obtained by multiplying an output
Output(r) from the excitation feature extractor 14b by
a predetermined weight α is added to an output
Output(q) from the excitation or repression feature
extractor 14a. A value obtained by multiplying an
output Output(p) from the repression feature extractor
14c by a predetermined weight β is subtracted from the
output Output(q). These integrating processes make it
possible to reduce identification errors at low
processing cost.
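
The integration of Fig. 14A can be sketched as follows. This is a minimal
sketch, not the implementation of the embodiment: the function name
`integrate_excitation_repression`, the parameter names `alpha` and `beta`
for the two predetermined weights, and the representation of each output as
a list of per-position likelihoods are assumptions made for illustration.

```python
def integrate_excitation_repression(out_q, out_r, out_p, alpha, beta):
    """Return Output(q) + alpha * Output(r) - beta * Output(p) per position.

    out_q: outputs of the excitation or repression feature extractor 14a.
    out_r: outputs of the excitation feature extractor 14b.
    out_p: outputs of the repression feature extractor 14c.
    alpha, beta: the two predetermined weights (names are assumptions).
    """
    if not (len(out_q) == len(out_r) == len(out_p)):
        raise ValueError("all outputs must cover the same positions")
    return [q + alpha * r - beta * p for q, r, p in zip(out_q, out_r, out_p)]


# Hypothetical likelihoods at two positions.
integrated = integrate_excitation_repression(
    [0.5, 0.25], [1.0, 0.0], [0.0, 1.0], alpha=0.5, beta=0.25
)
```

In this toy input the first position is reinforced by the similar feature
and the second is suppressed by the dissimilar one, at the cost of one
multiply-add per contributing extractor.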

The outline of the result integrating process
will be described with reference to Fig. 14B. A
virtual feature extraction plane 15 includes a large
number of virtual feature extractors 16. Feature
extractors 14e and 14f in Fig. 14B are feature
extractors used for integration. The virtual feature
extractor 16 is an integrated virtual feature
extractor. Features extracted by the feature
extractors 14e and 14f used for integration are of the
same type but have different variance levels (e.g.,
sizes).

An output Output(q) from the integrated virtual
feature extractor 16 is the average value of outputs
Output(r) and Output(p) from the feature extractors 14e
and 14f used for integration, or a sum of the outputs
Output(r) and Output(p) weighted by predetermined
weighting coefficients. This result integrating
process makes it possible to achieve identification
which is robust against the variance of the input
pattern at low processing cost.
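
The integration of Fig. 14B can be sketched as follows. This is a minimal
sketch under assumptions: the function name `integrate_virtual` and the
coefficient names `w_r` and `w_p` are hypothetical, and the default
coefficients of 0.5 are chosen so that the weighted sum reduces to the
average value mentioned above.

```python
def integrate_virtual(out_r, out_p, w_r=0.5, w_p=0.5):
    """Return w_r * Output(r) + w_p * Output(p) per position.

    out_r, out_p: outputs of the feature extractors 14e and 14f, which
        extract the same type of feature at different variance levels.
    w_r, w_p: weighting coefficients (names and defaults are assumptions);
        0.5 and 0.5 give the plain average.
    """
    if len(out_r) != len(out_p):
        raise ValueError("both outputs must cover the same positions")
    return [w_r * r + w_p * p for r, p in zip(out_r, out_p)]


# Hypothetical likelihoods from two size-specific extractors.
averaged = integrate_virtual([1.0, 0.0], [0.0, 0.5])
weighted = integrate_virtual([1.0], [2.0], w_r=0.25, w_p=0.75)
```

A single virtual output then stands in for several variance-specific
outputs, so later layers need not handle each variance level separately.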

Note that the above embodiments can be properly
combined and practiced.

According to each embodiment described above, it
is possible to perform pattern recognition which is
capable of robust identification for the variances of
an input pattern, and which reduces the processing cost
while decreasing the possibility of identification
errors.
<Other Embodiments by, e.g., Software> 

The present invention can be applied as part of a
system constituted by a plurality of devices (e.g., a
host computer, interface device, reader, and printer)
or as part of a single apparatus (e.g., a copying
machine or facsimile apparatus).

Also, the present invention is not limited to the
apparatuses and methods which implement the above
embodiments, or to a method performed by combining the
methods explained in the embodiments. That is, the
scope of the present invention also includes a case in
which the program code of software for implementing the
above embodiments is supplied to a computer (or a CPU
or MPU) of the system or apparatus described above, and
this computer of the system or apparatus implements the
embodiments by operating the various devices described
above in accordance with the program code.

In this case, the program code itself of the
software implements the functions of the above
embodiments, and the program code itself and a means
for supplying this program code to the computer, more
specifically, a storage medium storing the program code,
come within the scope of the present invention.

As this storage medium storing the program code,
it is possible to use, e.g., a floppy (R) disk, hard
disk, optical disk, magnetooptical disk, CD-ROM,
magnetic tape, nonvolatile memory card, or ROM.

The program code also falls under the scope of
the present invention not only in a case in which the
computer implements the functions of the above
embodiments by controlling the various devices in
accordance with the supplied program code, but also in
a case in which the program code implements the above
embodiments in collaboration with, e.g., an OS
(Operating System) or another application software
running on the computer.

Furthermore, the scope of the present invention
also includes a case in which the supplied program is
stored in a memory of a function expansion board of the
computer or in a memory of a function expansion unit
connected to the computer, and a CPU or the like of the
function expansion board or function expansion unit
implements the above embodiments by performing part or
the whole of actual processing in accordance with
instructions by the program code.

Fig. 12 is a view showing an example of the block
configuration of an information processing apparatus
which implements the present invention. As shown in
Fig. 12, in this information processing apparatus, a
CPU 1201, ROM 1202, RAM 1203, HD (Hard Disk) 1204, CD
1205, KB (KeyBoard) 1206, CRT 1207, camera 1208, and
network interface (I/F) 1209 are connected via a bus
1210 so that they can communicate with each other.

The CPU 1201 controls the operation of the whole
information processing apparatus by reading out process
programs (software programs) from the HD (Hard Disk)
1204 or the like, and executing the readout programs.

The ROM 1202 stores programs and various data
used in the programs.

The RAM 1203 is used as, e.g., a working area for
temporarily storing process programs and information to
be processed, in order to allow the CPU 1201 to perform
various processes.

The HD 1204 is an example of a large-capacity
storage, and saves, e.g., various data such as model
data, and process programs to be transferred to the
RAM 1203 and the like when various processes are
executed.

The CD (CD drive) 1205 reads out data stored in
a CD (CD-R) as an example of an external storage, and
writes data in this CD.

The keyboard 1206 is an operation unit by which a
user inputs, e.g., various instructions to the
information processing apparatus.

The CRT 1207 displays various pieces of directive
information to a user, and various pieces of
information such as character information and image
information.

The camera 1208 senses an image to be identified,
and inputs the sensed image.

The interface 1209 is used to load information
from the network, and transmit information to the
network.

As many apparently widely different embodiments
of the present invention can be made without departing
from the spirit and scope thereof, it is to be
understood that the invention is not limited to the
specific embodiments thereof except as defined in the
appended claims.

CLAIM OF PRIORITY 
This application claims priority from Japanese
Patent Application No. 2003-417973 filed on December
16, 2003, which is hereby incorporated by reference
herein.



